@tiffanychu90 tiffanychu90 commented Jan 21, 2026

open data

Bring in the columns needed so we can move fully onto fct_monthly_scheduled_stops and fct_monthly_routes, and change the underlying dataset from publishing a daily sample to publishing everything we observed that month.

TODOs for bridge table in the warehouse

Bridge table can benefit from:

  • handling a missing NTD ID or RTPA. This is similar to the issue when analysis_name was introduced: some versioned keys have it, but earlier versioned keys don't
  • being more resilient in picking up variations in schedule_gtfs_dataset_name (Trinity Schedule vs Trinity Remix Schedule)
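
As a rough illustration of the two TODOs above, here is a minimal Python sketch, not the warehouse implementation. It assumes feed names can be matched after dropping a hypothetical list of ignorable tokens (`IGNORED_TOKENS`), and that a missing value can be borrowed from another versioned key with the same normalized name. The helper names, the `key` field, and the sample NTD ID / RTPA values are all made up for the example.

```python
import re

# Hypothetical tokens to ignore when comparing feed names (an assumption,
# not a list taken from the warehouse).
IGNORED_TOKENS = {"remix"}


def normalize_name(name: str) -> str:
    """Lowercase, drop ignored tokens, and collapse punctuation/whitespace."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in IGNORED_TOKENS)


def fill_missing(rows, group_key, fields):
    """Fill None values in `fields` using the first non-null value seen
    among rows that share the same group key."""
    known = {}
    for row in rows:
        group = known.setdefault(group_key(row), {})
        for f in fields:
            if row.get(f) is not None:
                group.setdefault(f, row[f])  # first non-null value wins

    filled = []
    for row in rows:
        group = known[group_key(row)]
        new_row = dict(row)
        for f in fields:
            if new_row.get(f) is None:
                new_row[f] = group.get(f)  # stays None if never observed
        filled.append(new_row)
    return filled


# Made-up sample rows: two versioned keys for what is assumed to be one feed.
rows = [
    {"key": "v1", "schedule_gtfs_dataset_name": "Trinity Schedule",
     "ntd_id": None, "rtpa_name": None},
    {"key": "v2", "schedule_gtfs_dataset_name": "Trinity Remix Schedule",
     "ntd_id": "00000", "rtpa_name": "Example RTPA"},
]

filled = fill_missing(
    rows,
    group_key=lambda r: normalize_name(r["schedule_gtfs_dataset_name"]),
    fields=["ntd_id", "rtpa_name"],
)
```

Note that this takes the first non-null value per group and does not resolve conflicts, so a real version would likely need a rule for which versioned key to trust.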

@tiffanychu90 tiffanychu90 changed the title from "Finish refactoring scripts for Geoportal's ca_transit_stops, ca_transit_routes" to "Refactor open data part 2" Jan 21, 2026
@github-actions

nbviewer URLs for impacted notebooks:

@edasmalchi

@tiffanychu90 can we keep more of portfolio_utils around for the time being? It's used in HQTA and rt_segment_speeds: https://github.com/search?q=repo%3Acal-itp%2Fdata-analyses+portfolio_utils.standardize_operator_info_for_exports&type=code

@edasmalchi

Excited for this though, should I try doing the ArcGIS process starting from these for CA Transit Routes/Stops?

tiffanychu90 commented Jan 28, 2026

> Excited for this though, should I try doing the ArcGIS process starting from these for CA Transit Routes/Stops?

@edasmalchi: Yes, you can start! I think I can merge tomorrow.

Other notes:

  • I only used a subset of columns from fct_monthly_scheduled_stops to replicate what was there before, so take a look at the columns that are available now. I tried to make it obvious where adjustments can be made, such as the list of columns I read in and the renaming function, so both scripts follow a similar flow.
  • I also only brought in a subset of columns from the bridge table, so that's another place to check for new columns to include.
  • Since month_first_day is the effective filter now, the previous month's data is available in full on the 1st of each month. I have a line in those models that holds back a month's rows until the full month is available.
  • I also noticed that, I think for Nov 2025, one of the Amtrak routes (California Zephyr) looked weird, but only in 1 of the 2 shapes present, and the issue seemed to disappear in Dec 2025. I don't think there's a way to detect this beyond visual inspection. gdf.geometry.is_simple was False for both rows, so filtering on it would either drop both or keep both.
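
To make the month_first_day point concrete, here is a small Python sketch of the "only publish complete months" rule. It is an illustration under assumptions: the real condition lives in the warehouse models and may be written differently, and the function names here are hypothetical.

```python
from datetime import date


def next_month_start(month_first_day: date) -> date:
    """Return the first day of the month after `month_first_day`."""
    if month_first_day.month == 12:
        return date(month_first_day.year + 1, 1, 1)
    return date(month_first_day.year, month_first_day.month + 1, 1)


def is_month_complete(month_first_day: date, today: date) -> bool:
    """A month counts as complete once `today` has reached the 1st of the
    following month."""
    return today >= next_month_start(month_first_day)


def complete_months(month_first_days, today: date):
    """Keep only the months whose data would be available in full."""
    return [m for m in month_first_days if is_month_complete(m, today)]


months = [date(2025, 11, 1), date(2025, 12, 1), date(2026, 1, 1)]
available = complete_months(months, today=date(2026, 1, 1))
```

With `today` set to Jan 1, 2026, only the Nov 2025 and Dec 2025 months are kept, and Jan 2026 is held back until Feb 1.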

@tiffanychu90 tiffanychu90 merged commit 6f225c2 into main Jan 28, 2026
3 checks passed
@tiffanychu90 tiffanychu90 deleted the more-open-data-refactor branch January 28, 2026 18:45